[134] Briefing: Big Data Technology Updates - 20170307

ML & DL & AI & RL

Voice from Facebook: Using Apache Spark for Large-Scale Language Model Training
DATA LAKE 3.0 PART 3 – DISTRIBUTED TENSORFLOW ASSEMBLY ON APACHE HADOOP YARN
Machine Learning in the Age of Big Data

Spark

Working with Complex Data Formats with Structured Streaming in Apache Spark 2.1
Processing a Trillion Rows Per Second on a Single Machine: How Can Nested Loop Joins be this Fast?
Achieving a 300% speedup in ETL with Apache Spark

SQL & Real-Time Analytics on Hadoop

Performance comparison of different file formats and storage engines in the Apache Hadoop ecosystem
Apache Kudu: Top Use Cases for Real-Time Analytics
How To Set Up a Shared Amazon RDS as Your Hive Metastore
Latest Impala Cookbook

Hadoop

DATA LAKE 3.0 PART 2 – A MULTI-COLORED YARN
DATA LAKE 3.0: THE EZ BUTTON TO DEPLOY IN MINUTES AND CUT TCO BY HALF
How-to: Use the New HDFS Intra-DataNode Disk Balancer in Apache Hadoop
Apache Hadoop 3.0.0-alpha2 Released
Untangling Apache Hadoop YARN, Part 5: Using FairScheduler queue properties
HDFS DataNode Scanners and Disk Checker Explained
How-to: Deploy a Secure Enterprise Data Hub on Microsoft Azure – Part 1
How-to: Deploy a Secure Enterprise Data Hub on Microsoft Azure – Part 2

Data Ingestion

New in Cloudera Enterprise 5.8: Flafka Improvements for Real-Time Data Ingest

Testing & Evaluating

YCSB 0.10.0 Now in Cloudera Labs
Quality Assurance at Cloudera: Highly-Controlled Disk Injection